AITopics | stable training

ac1dd209cbcc5e5d1c6e28598e8cbbe8-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-13-2026, 12:57:35 GMT

invertible layer, invertible learning, reviewer 1, (13 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.52)

Technology: Information Technology > Artificial Intelligence (0.72)

ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

Neural Information Processing Systems | Dec-26-2025, 23:35:15 GMT

In diffusion models, UNet is the most popular network backbone, since its long skip connections (LSCs), which connect distant network blocks, can aggregate long-distance information and alleviate vanishing gradients. Unfortunately, UNet often suffers from unstable training in diffusion models, which can be alleviated by scaling its LSC coefficients to smaller values. However, a theoretical understanding of the instability of UNet in diffusion models, and of the performance improvement from LSC scaling, is still lacking. To address this, we theoretically show that the coefficients of LSCs in UNet strongly affect the stability of forward and backward propagation and the robustness of UNet. Specifically, the hidden features and gradients of UNet at any layer can oscillate, and their oscillation ranges are large, which explains the instability of UNet training. Moreover, UNet is provably sensitive to perturbed input and predicts an output distant from the desired output, yielding an oscillatory loss and thus oscillatory gradients. We also observe the theoretical benefits of LSC coefficient scaling for the stability of hidden features and gradients and for robustness. Finally, inspired by our theory, we propose an effective coefficient scaling framework, ScaleLong, that scales the coefficients of LSCs in UNet and improves the training stability of UNet. Experimental results on CIFAR10, CelebA, ImageNet and COCO show that our methods stabilize training better and yield about 1.5x training acceleration on different diffusion models with UNet or UViT backbones.
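
As a rough illustration of the coefficient scaling this abstract describes, the sketch below merges a decoder feature with a skip feature multiplied by a coefficient `kappa`. The helper name, the additive merge, and the single constant `kappa` are assumptions made for illustration; the abstract does not say how ScaleLong chooses or applies its coefficients.

```python
def merge_with_scaled_skip(decoder_feature, skip_feature, kappa):
    """Combine a decoder feature with a long-skip feature scaled by kappa.

    Hypothetical helper: the additive merge and the single constant kappa
    are assumed here for illustration only.
    """
    return [d + kappa * s for d, s in zip(decoder_feature, skip_feature)]


decoder = [0.5, -0.2]
skip = [1.0, 2.0]
perturbed_skip = [1.1, 2.1]  # the same skip feature with a 0.1 perturbation

for kappa in (1.0, 0.5):
    clean = merge_with_scaled_skip(decoder, skip, kappa)
    noisy = merge_with_scaled_skip(decoder, perturbed_skip, kappa)
    # The perturbation that reaches the merged feature shrinks with kappa.
    shift = max(abs(a - b) for a, b in zip(clean, noisy))
    print(f"kappa={kappa}: output shift = {shift:.3f}")
```

In this toy setting a perturbation on the skip path reaches the merged feature multiplied by `kappa`, which is one simple way to picture why smaller coefficients could reduce sensitivity to perturbed input.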

diffusion model, scalelong, unet, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Rebooting ACGAN: Auxiliary Classifier GANs with Stable Training

Neural Information Processing Systems | Dec-24-2025, 21:28:05 GMT

Conditional Generative Adversarial Networks (cGAN) generate realistic images by incorporating class information into GAN. While one of the most popular cGANs is an auxiliary classifier GAN with softmax cross-entropy loss (ACGAN), it is widely known that training ACGAN is challenging as the number of classes in the dataset increases. ACGAN also tends to generate easily classifiable samples with a lack of diversity. In this paper, we introduce two cures for ACGAN. First, we identify that gradient exploding in the classifier can cause an undesirable collapse in early training, and projecting input vectors onto a unit hypersphere can resolve the problem. Second, we propose the Data-to-Data Cross-Entropy loss (D2D-CE) to exploit relational information in the class-labeled dataset. On this foundation, we propose the Rebooted Auxiliary Classifier Generative Adversarial Network (ReACGAN). The experimental results show that ReACGAN achieves state-of-the-art generation results on CIFAR10, Tiny-ImageNet, CUB200, and ImageNet datasets. We also verify that ReACGAN benefits from differentiable augmentations and that D2D-CE harmonizes with StyleGAN2 architecture.
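
The first remedy in this abstract, projecting input vectors onto a unit hypersphere, can be pictured with the minimal sketch below. The function names are hypothetical, and normalizing the class weight vector as well as the feature is an assumption; the abstract only states that input vectors are projected.

```python
import math


def project_to_unit_sphere(v, eps=1e-12):
    """Rescale v to unit L2 norm; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def cosine_logit(feature, class_weight):
    """Dot product of the two vectors after projecting both onto the sphere.

    Assumption: both the feature and the class weight are normalized.
    """
    f = project_to_unit_sphere(feature)
    w = project_to_unit_sphere(class_weight)
    return sum(a * b for a, b in zip(f, w))


# A feature with a very large norm still gives a logit in [-1, 1].
print(cosine_logit([1000.0, 5.0], [2.0, 0.0]))
```

Because the logit no longer grows with the feature norm, its magnitude stays bounded, which is one intuition for why the projection might keep classifier gradients from exploding early in training.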

auxiliary classifier gan, name change, rebooting acgan, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Improved Training of Wasserstein GANs

Neural Information Processing Systems | Nov-21-2025, 15:47:32 GMT

Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only poor samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models with continuous generators. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms.
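
The alternative to weight clipping named in this abstract, penalizing the norm of the critic's gradient with respect to its input, can be sketched in a few lines. This is a toy under stated assumptions: the squared deviation from a target norm of 1, the evaluation point on the segment between a real and a generated sample, and the finite-difference gradient (used instead of automatic differentiation so the example needs only the standard library) are illustrative choices not given in the abstract. All function names are hypothetical.

```python
import math
import random


def numerical_gradient(critic, x, h=1e-5):
    """Central-difference estimate of d critic / d x at the point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((critic(up) - critic(down)) / (2 * h))
    return grad


def gradient_penalty(critic, real, fake, rng, target_norm=1.0):
    """Squared deviation of the critic's input-gradient norm from target_norm,
    evaluated at a random point between a real and a generated sample."""
    t = rng.random()
    x_hat = [t * r + (1 - t) * f for r, f in zip(real, fake)]
    grad = numerical_gradient(critic, x_hat)
    norm = math.sqrt(sum(g * g for g in grad))
    return (norm - target_norm) ** 2


def linear_critic(x):
    # Toy critic whose input gradient is (3, 4) everywhere, so its norm is 5.
    return 3.0 * x[0] + 4.0 * x[1]


print(gradient_penalty(linear_critic, [1.0, 2.0], [0.0, -1.0], random.Random(0)))
```

For the linear toy critic the gradient norm is 5 at every point, so the penalty is (5 - 1)^2 = 16 up to finite-difference error. In training, a term like this would be added to the critic loss with a weighting coefficient.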

improved training, name change, wasserstein gan, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.65)
Information Technology > Artificial Intelligence > Natural Language (0.61)

out that the paper is "well-written", reviewer 4 says that "the paper has an originality" and notices the "very nice

Neural Information Processing Systems | Aug-19-2025, 23:09:30 GMT

Previous "learning to infer" models were only able to perform reconstruction on 2D slices.

invertible layer, invertible learning, reviewer 4, (13 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.52)

Technology: Information Technology > Artificial Intelligence (0.72)

aac02401755a65904cf977a33136af4a-Supplemental-Conference.pdf

Neural Information Processing Systems | Aug-17-2025, 13:39:15 GMT

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.32)

ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

Neural Information Processing Systems | Jan-20-2025, 00:27:55 GMT

In diffusion models, UNet is the most popular network backbone, since its long skip connections (LSCs), which connect distant network blocks, can aggregate long-distance information and alleviate vanishing gradients. Unfortunately, UNet often suffers from unstable training in diffusion models, which can be alleviated by scaling its LSC coefficients to smaller values. However, a theoretical understanding of the instability of UNet in diffusion models, and of the performance improvement from LSC scaling, is still lacking. To address this, we theoretically show that the coefficients of LSCs in UNet strongly affect the stability of forward and backward propagation and the robustness of UNet. Specifically, the hidden features and gradients of UNet at any layer can oscillate, and their oscillation ranges are large, which explains the instability of UNet training. Moreover, UNet is provably sensitive to perturbed input and predicts an output distant from the desired output, yielding an oscillatory loss and thus oscillatory gradients.

diffusion model, scaling network long skip connection, unet, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Rebooting ACGAN: Auxiliary Classifier GANs with Stable Training

Neural Information Processing Systems | Jan-19-2025, 02:18:49 GMT

Conditional Generative Adversarial Networks (cGAN) generate realistic images by incorporating class information into GAN. While one of the most popular cGANs is an auxiliary classifier GAN with softmax cross-entropy loss (ACGAN), it is widely known that training ACGAN is challenging as the number of classes in the dataset increases. ACGAN also tends to generate easily classifiable samples with a lack of diversity. In this paper, we introduce two cures for ACGAN. First, we identify that gradient exploding in the classifier can cause an undesirable collapse in early training, and projecting input vectors onto a unit hypersphere can resolve the problem.

acgan, auxiliary classifier gan, rebooting acgan, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.65)

Towards stable training of parallel continual learning

Yuepan, Li, Lyu, Fan, Li, Yuyang, Feng, Wei, Liu, Guangcan, Shang, Fanhua

arXiv.org Artificial Intelligence | Jul-11-2024

Parallel Continual Learning (PCL) tasks investigate the training methods for continual learning with multi-source input, where data from different tasks are learned as they arrive. PCL offers high training efficiency and is well-suited for complex multi-source data systems, such as autonomous vehicles equipped with multiple sensors. However, at any time, multiple tasks need to be trained simultaneously, leading to severe training instability in PCL. This instability manifests during both forward and backward propagation, where features are entangled and gradients conflict. This paper introduces Stable Parallel Continual Learning (SPCL), a novel approach that enhances the training stability of PCL for both forward and backward propagation. For the forward propagation, we apply Doubly-block Toeplitz (DBT) matrix-based orthogonality constraints to network parameters to ensure stable and consistent propagation. For the backward propagation, we employ orthogonal decomposition for gradient management, which stabilizes backpropagation and mitigates gradient conflicts across tasks. By optimizing gradients through ensuring orthogonality and minimizing the condition number, SPCL effectively stabilizes gradient descent in complex optimization tasks. Experimental results demonstrate that SPCL outperforms state-of-the-art methods and achieves better training stability.
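
To make "gradient conflict" concrete, the sketch below treats two task gradients as conflicting when their dot product is negative, and removes the conflicting component by projecting one gradient onto the orthogonal complement of the other. This is a generic projection step in the spirit of gradient-surgery methods, swapped in for illustration. It is not SPCL's orthogonal decomposition, which the abstract does not spell out, and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_conflict(grad, other):
    """Return grad with its component along `other` removed when the two
    gradients conflict (negative dot product); otherwise return a copy."""
    overlap = dot(grad, other)
    if overlap >= 0:
        return list(grad)
    scale = overlap / dot(other, other)
    return [g - scale * o for g, o in zip(grad, other)]


task_a = [1.0, -1.0]
task_b = [0.0, 1.0]
adjusted = remove_conflict(task_a, task_b)
# After the projection the adjusted gradient is orthogonal to task_b.
print(adjusted, dot(adjusted, task_b))
```

After the adjustment, a step along the first task's gradient no longer moves against the second task's gradient to first order.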

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2407.08214

Country:

North America > United States (0.14)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

Stability-Aware Training of Neural Network Interatomic Potentials with Differentiable Boltzmann Estimators

Raja, Sanjeev, Amin, Ishan, Pedregosa, Fabian, Krishnapriyan, Aditi S.

arXiv.org Artificial Intelligence | Feb-21-2024

Neural network interatomic potentials (NNIPs) are an attractive alternative to ab-initio methods for molecular dynamics (MD) simulations. However, they can produce unstable simulations which sample unphysical states, limiting their usefulness for modeling phenomena occurring over longer timescales. To address these challenges, we present Stability-Aware Boltzmann Estimator (StABlE) Training, a multimodal training procedure which combines conventional supervised training from quantum-mechanical energies and forces with reference system observables, to produce stable and accurate NNIPs. StABlE Training iteratively runs MD simulations to seek out unstable regions, and corrects the instabilities via supervision with a reference observable. The training procedure is enabled by the Boltzmann Estimator, which allows efficient computation of gradients required to train neural networks to system observables, and can detect both global and local instabilities. We demonstrate our methodology across organic molecules, tetrapeptides, and condensed phase systems, along with using three modern NNIP architectures. In all three cases, StABlE-trained models achieve significant improvements in simulation stability and recovery of structural and dynamic observables. In some cases, StABlE-trained models outperform conventional models trained on datasets 50 times larger. As a general framework applicable across NNIP architectures and systems, StABlE Training is a powerful tool for training stable and accurate NNIPs, particularly in the absence of large reference datasets.

Molecular dynamics (MD) simulation is a staple method of computational science, enabling high-resolution spatiotemporal modeling of atomistic systems throughout biology, chemistry, and materials science [21]. Under the Born-Oppenheimer approximation, system evolution is governed by the underlying potential energy surface (PES), which is a function of the nuclear Cartesian coordinates [11]. While the atomic forces needed for MD simulation can be obtained on-the-fly via ab-initio quantum-mechanical (QM) calculations [12], the unfavorable scaling of this approach makes it prohibitively expensive for realistic system sizes and timescales [22]. There is a long history of using machine learning (ML) approaches in place of ab-initio methods to efficiently approximate the global PES [7, 6, 2, 55]. NNIPs, typically parameterized as graph neural networks [56, 33], are trained by matching energy and forces of a molecule or material from a reference dataset of QM calculations, such as Density Functional Theory (DFT) [31]. NNIPs trained on large ab-initio datasets are increasingly being used to model challenging and important chemical systems with favorable results [45, 37, 15, 57, 64, 43, 14, 3, 36, 60, 26, 19].
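
The abstract above says the Boltzmann Estimator computes gradients of system observables but does not give a formula. One standard identity for Boltzmann averages is that, for a parameter-independent observable O and energy U_theta, the derivative of the average of O with respect to theta equals -beta times the covariance of O and dU/dtheta. Whether StABlE's estimator takes exactly this form is an assumption here. The toy below checks the identity on a small discrete system by exact enumeration, where a real workflow would average over MD samples; all names and the toy energy are hypothetical.

```python
import math


def boltzmann_average(values, energies, beta):
    """Exact average of `values` under weights proportional to exp(-beta * E)."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / z


def observable_gradient(states, theta, beta, observable, energy, denergy_dtheta):
    """d<O>/dtheta computed as -beta * Cov(O, dU/dtheta) under the Boltzmann weights."""
    energies = [energy(x, theta) for x in states]
    obs = [observable(x) for x in states]
    du = [denergy_dtheta(x, theta) for x in states]
    mean_o = boltzmann_average(obs, energies, beta)
    mean_du = boltzmann_average(du, energies, beta)
    mean_o_du = boltzmann_average([o * d for o, d in zip(obs, du)], energies, beta)
    return -beta * (mean_o_du - mean_o * mean_du)


# Toy system (assumed): four states, energy U = theta * x^2, observable O = x.
states = [-1.0, 0.0, 1.0, 2.0]
theta, beta, h = 0.7, 1.0, 1e-6


def toy_energy(x, th):
    return th * x * x


def average_x(th):
    return boltzmann_average(states, [toy_energy(x, th) for x in states], beta)


estimate = observable_gradient(states, theta, beta, lambda x: x,
                               toy_energy, lambda x, th: x * x)
finite_diff = (average_x(theta + h) - average_x(theta - h)) / (2 * h)
print(estimate, finite_diff)
```

The appeal of a covariance form like this is that it needs only samples of O and dU/dtheta, not differentiation through the simulation itself.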

md simulation, simulation, stable training, (14 more...)

arXiv.org Artificial Intelligence

2402.13984

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Energy (1.00)
Government > Regional Government (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
